Democratizing Access to Education Data

The Urban Institute’s Education Data Portal

Erika Tyagi

The Education Data Portal bridges the gap between data availability and data accessibility.

  1. What do I mean by the availability-accessibility gap?
  2. How does the portal bridge this gap?
  3. Why does bridging this gap matter?

What do I mean by the data availability-accessibility gap?

Example: Collecting data on COVID in jails and prisons

  • A spreadsheet
  • Scanned as a PDF
  • With dark text on a dark background
  • And a little blurry
  • And inconsistent rows and columns
  • And the occasional coffee spill

Accessible to whom?

How does the portal bridge this gap?

  • Provides a one-stop-shop for 100+ datasets released by government agencies and other institutions on schools, school districts, and colleges in the U.S.
  • Includes harmonized data and metadata for each dataset
  • Makes it easier for users to look at trends over time and combine data from different sources

Without the Education Data Portal…

Example: How has tuition at my alma mater changed?

  • Find the agency collecting the data
  • Read the data documentation
  • Download data files for each year
  • Load each file into R or Python
  • Notice a few anomalies
  • Re-read the data documentation
  • Give up (or rather, take an ice cream break)
  • Update the code per the documentation
  • Remember to repeat the process again next year
  • (And hope nothing changes)

This is tedious, error-prone, and simply not fun.
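
To make the steps above concrete, here is a minimal Python sketch of the "load each file" and "notice a few anomalies" steps. Everything in it is illustrative: the column names, years, and tuition values are made up (they are not real IPEDS data), and the in-memory CSV text stands in for files you would otherwise download by hand.

```python
import csv
import io

# Hypothetical raw files: the tuition column is named differently each year,
# the kind of anomaly that sends you back to the documentation.
RAW_FILES = {
    2018: "UNITID,TUITION2\n173258,100\n",
    2019: "unitid,tuition_in_state\n173258,110\n",
    2020: "unitid,tuition_fees\n173258,120\n",
}

# Hand-maintained mapping from each year's column names to one shared schema.
COLUMN_MAP = {
    2018: {"UNITID": "unitid", "TUITION2": "tuition"},
    2019: {"unitid": "unitid", "tuition_in_state": "tuition"},
    2020: {"unitid": "unitid", "tuition_fees": "tuition"},
}


def load_year(year, text):
    """Read one year's file and rename its columns to the shared schema."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        renamed = {new: raw[old] for old, new in COLUMN_MAP[year].items()}
        rows.append(
            {
                "unitid": renamed["unitid"],
                "year": year,
                "tuition": int(renamed["tuition"]),
            }
        )
    return rows


def load_all(raw_files):
    """Stack every year into one long list of rows, ordered by year."""
    combined = []
    for year in sorted(raw_files):
        combined.extend(load_year(year, raw_files[year]))
    return combined


tuition = load_all(RAW_FILES)
for row in tuition:
    print(row)
```

The fragile part is COLUMN_MAP: it has to be extended by hand every time a new year is released, and it breaks silently if a column is renamed.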

Using the portal R package

Example: How has tuition at my alma mater changed?

library(educationdata)
library(ggplot2)

# Get tuition data for one institution (unitid), 1990-2020
data <- get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = list(
    year = 1990:2020,
    unitid = "173258",
    tuition_type = "4"
  )
)

# Plot tuition over time
ggplot(data, aes(x = year, y = tuition_fees_ft)) +
  geom_line()

Using the portal Python package

Example: How has tuition at my alma mater changed?

from educationdata import get_education_data

# Get tuition data for one institution (unitid), 1990-2020
data = get_education_data(
    level="college-university",
    source="ipeds",
    topic="academic-year-tuition",
    filters={
        "year": range(1990, 2021),  # end is exclusive, so this covers 1990-2020
        "unitid": "173258",
        "tuition_type": "4",
    },
)

# Plot tuition over time
data.plot.line(x="year", y="tuition_fees_ft")

Using the portal Stata package

Example: How has tuition at my alma mater changed?

* Get data 
educationdata using ///
  "college ipeds academic-year-tuition", sub( ///
  year=1990/2020 ///
  unitid=173258 ///
  tuition_type=4 ///
)

* Plot data 
twoway (line tuition_fees_ft year)

Using the portal Data Explorer

Example: How has tuition at my alma mater changed?

[Video: Data Explorer demo]

Why do I think the portal bridges this gap so effectively?

  1. By focusing on the underlying API
  2. By focusing on data documentation

The underlying API

  • 120+ data endpoints (with the data)
  • 12+ metadata endpoints (about the data)
  • All other tools, packages, and documentation are built on these endpoints
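
As a rough illustration of what "built on these endpoints" can mean, the Python sketch below assembles a request URL from the same level, source, topic, year, and filter values used in the package examples. The base URL and the path layout are assumptions here, not something these slides confirm, and build_endpoint_url is a hypothetical helper; check the portal's API documentation before relying on either.

```python
from urllib.parse import urlencode

# Assumed base URL for the portal's API (verify against the official docs).
BASE_URL = "https://educationdata.urban.org/api/v1"


def build_endpoint_url(level, source, topic, year, **filters):
    """Build a data-endpoint URL (hypothetical helper, assumed path layout).

    Assumes the path is <base>/<level>/<source>/<topic>/<year>/ and that
    filters are passed as query-string parameters.
    """
    path = "/".join([BASE_URL, level, source, topic, str(year)]) + "/"
    # Sort filters so the same request always yields the same URL.
    query = urlencode(sorted(filters.items()))
    return f"{path}?{query}" if query else path


url = build_endpoint_url(
    "college-university",
    "ipeds",
    "academic-year-tuition",
    2020,
    unitid=173258,
    tuition_type=4,
)
print(url)
```

Because every tool goes through the same endpoints, a fix or a new year of data at the API level reaches the R, Python, and Stata packages and the Data Explorer at once.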

Data documentation

  • Considered a first-order priority
  • For humans and machines
  • With details on demand

Why do I think the portal bridges this gap so effectively?

By focusing on the underlying API and data documentation

Why does bridging this gap matter?

Different people ask different—and important—questions.

Get in touch